Why is working with health data so difficult?

Thursday, Sep 28, 2023

Expecting common data standards to fix healthcare’s interoperability problems is too simplistic. It ignores the fact that there are many nuances to health data, and many of those nuances reflect critical information about the person and how the data was captured.

Health data is notoriously difficult to work with, and many people blame the lack of common standards. The reasoning is that shared standards create structural interoperability, which has been helpful in other industries. However, this approach ignores the many nuances of health data. In healthcare, structural interoperability does not translate into semantic interoperability.

“Structural interoperability” means systems agree on the structure of data, like field names and format. This is done by using schemas like FHIR, or by using semantic mapping technologies like JSON-LD. “Semantic interoperability” means systems agree on the meaning of data.

In many business domains, structural interoperability more or less guarantees semantic interoperability. Parties agree on data structures (schemas) or semantic mappings of those structures (JSON-LD), and on how to codify data. For example, if systems exchange country information, they exchange well-defined, unambiguous country codes. This way, each system knows what the other is talking about and can render that information in the language of its users. This is not so for health data, where moving from structural interoperability to semantic interoperability is much harder. There are several reasons for that.

Health data standards have to be definitive but flexible, so they allow for certain extensions. These extensions are used differently by different vendors. I worked with data from a vendor that contained source site information in an extension field, but another vendor had the same information somewhere else. There are also variations in the way data standards are used to respond to local needs. Add to this the interoperability issues between different versions of the same standard.
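As a rough illustration of the vendor difference described above, here is a minimal Python sketch that looks for source site information in two places. The extension URL, the use of `meta.source` for the second vendor, and the sample values are all assumptions made up for this example; real vendors may place the value elsewhere.

```python
from typing import Any, Optional

# Hypothetical extension URL, used only for illustration.
SOURCE_SITE_EXT_URL = "http://example.org/fhir/StructureDefinition/source-site"


def source_site(resource: dict[str, Any]) -> Optional[str]:
    """Return the source site of a FHIR-like resource, or None if absent."""
    # Vendor A (assumed): the value is carried in an extension entry.
    for ext in resource.get("extension", []):
        if ext.get("url") == SOURCE_SITE_EXT_URL:
            return ext.get("valueString")
    # Vendor B (assumed): the value is carried in meta.source instead.
    return resource.get("meta", {}).get("source")


vendor_a = {
    "resourceType": "Observation",
    "extension": [{"url": SOURCE_SITE_EXT_URL, "valueString": "clinic-12"}],
}
vendor_b = {"resourceType": "Observation", "meta": {"source": "clinic-12"}}

print(source_site(vendor_a), source_site(vendor_b))
```

Both resources are structurally valid, yet a consumer needs vendor-specific knowledge to read the same fact from each of them.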

But there are even more reasons preventing semantic interoperability for health data.

Health data uses codifications extensively. There are multiple coding systems that describe concepts such as measurements, diagnoses, medicines, etc. And this is where things get interesting. The same concept may have different codes in different coding systems. For example, both the NDC and RxNorm coding systems are used to describe medications. When you search for persons who use, say, Aspirin, you have to search for all forms of Aspirin in the NDC coding system and all forms of Aspirin in the RxNorm coding system.

So, when you are working with health data, searching for a data value may mean comparing it against a curated set of possible values. These sets of values (usually called “valuesets”) define semantic equivalence classes. They may be different from use case to use case.
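A minimal Python sketch of this kind of valueset lookup might look like the following. The system names and codes are placeholders invented for the example, not real NDC or RxNorm entries, and the record layout is an assumption.

```python
from typing import Iterable, NamedTuple


class Coding(NamedTuple):
    system: str  # coding system identifier, e.g. "ndc" or "rxnorm"
    code: str    # code within that system


class MedicationRecord(NamedTuple):
    person_id: str
    coding: Coding


# Hypothetical valueset: every code, in every system, that counts as Aspirin
# for this particular use case. The codes below are placeholders.
ASPIRIN_VALUESET = frozenset({
    Coding("rxnorm", "RX-ASPIRIN-81MG"),
    Coding("rxnorm", "RX-ASPIRIN-325MG"),
    Coding("ndc", "NDC-ASPIRIN-81MG"),
})


def persons_using(records: Iterable[MedicationRecord],
                  valueset: frozenset) -> set:
    """Return ids of persons with at least one record whose coding is in the valueset."""
    return {r.person_id for r in records if r.coding in valueset}


records = [
    MedicationRecord("p1", Coding("rxnorm", "RX-ASPIRIN-81MG")),
    MedicationRecord("p2", Coding("ndc", "NDC-ASPIRIN-81MG")),
    MedicationRecord("p3", Coding("ndc", "NDC-SOMETHING-ELSE")),
]
print(sorted(persons_using(records, ASPIRIN_VALUESET)))
```

The comparison is on the (system, code) pair rather than the code alone, because the same code string can mean different things in different coding systems.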

These coding systems are usually dynamic: new entries are added and existing entries are deprecated all the time. Data may contain deprecated codes or codes from mixed versions of coding systems. People even use non-standard local extensions that are meaningful only within an organization.

Measurements make things even more complicated. Consider a “height” data field. Say you are searching for people who are taller than 180 cm. If data coming from one source is in inches and data from another is in centimeters, you have to harmonize the units before you can compare them: 180 cm is about 5'11", so a height recorded as 6'2" qualifies and one recorded as 5'9" does not. There are different ways of dealing with this. For instance, you can pick a unit and convert all height values to that. This is easy to do for “height”, but for other values, it may result in lossy transformations.
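The "pick a unit and convert" approach can be sketched in a few lines of Python. The `Height` type and function names are made up for this example, and the unit strings follow UCUM-style codes (`cm`, `[in_i]`) as an assumption about how the source data is labeled. The original value and unit are kept on the record, so the conversion happens only at comparison time and nothing captured at the source is overwritten.

```python
from dataclasses import dataclass

CM_PER_INCH = 2.54  # exact by definition of the international inch


@dataclass(frozen=True)
class Height:
    value: float
    unit: str  # assumed UCUM-style unit code: "cm" or "[in_i]"


def to_cm(h: Height) -> float:
    """Convert a height to centimeters, rejecting units we do not know."""
    if h.unit == "cm":
        return h.value
    if h.unit == "[in_i]":
        return h.value * CM_PER_INCH
    raise ValueError(f"unsupported height unit: {h.unit!r}")


def taller_than(heights: list, threshold_cm: float) -> list:
    """Return the heights strictly greater than threshold_cm, in input order."""
    return [h for h in heights if to_cm(h) > threshold_cm]


heights = [
    Height(74, "[in_i]"),  # 6'2"
    Height(69, "[in_i]"),  # 5'9"
    Height(181, "cm"),
    Height(180, "cm"),
]
print(taller_than(heights, 180))
```

Failing loudly on an unknown unit is a deliberate choice here: silently treating an unlabeled number as centimeters would produce wrong answers that are hard to notice.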

Issues related to data privacy add a new layer of complexity. In many domains, data privacy issues can be handled by masking certain data fields like the person’s name, contact information, or demographic information. In other words, privacy concerns can be addressed using structural means. For health data, the fact that a patient is using an antidepressant, or the fact that the patient visited a certain clinic may suddenly elevate the privacy levels of many related records. In other words, privacy concerns should be handled by semantic means in a contextual manner.
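To make the contrast with field masking concrete, here is a minimal Python sketch of one possible contextual rule: if any record in an encounter carries a sensitive code, every record from that encounter is treated as restricted. The record layout, the rule itself, and the placeholder codes are assumptions for illustration, not an actual privacy policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    encounter_id: str
    system: str
    code: str


# Hypothetical valueset of sensitive (system, code) pairs; placeholder values.
SENSITIVE_CODES = frozenset({
    ("rxnorm", "RX-ANTIDEPRESSANT-PLACEHOLDER"),
})


def restricted_record_ids(records: list, sensitive_codes: frozenset) -> set:
    """Return ids of all records that share an encounter with a sensitive record."""
    sensitive_encounters = {
        r.encounter_id for r in records if (r.system, r.code) in sensitive_codes
    }
    return {r.record_id for r in records if r.encounter_id in sensitive_encounters}


records = [
    Record("r1", "enc-1", "rxnorm", "RX-ANTIDEPRESSANT-PLACEHOLDER"),
    Record("r2", "enc-1", "loinc", "LAB-PLACEHOLDER"),
    Record("r3", "enc-2", "loinc", "LAB-PLACEHOLDER"),
]
print(sorted(restricted_record_ids(records, SENSITIVE_CODES)))
```

Note that `r2` and `r3` carry the same code and would look identical to a structural masking rule; only the context of `r2` makes it sensitive.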

These are only some of the problems you have to face when working with health data.

I am not advocating that we should stop working on data standards and standard schemas. Structural interoperability is required for semantic interoperability. However, meaningful health data exchange cannot be achieved by simply constraining the structure of data.

Layered Schema Architecture is a solution designed for achieving semantic interoperability between disparate systems. It starts with a data schema (like FHIR) but extends it with overlays containing semantic annotations that describe how to harmonize data. We are using layered schemas to create knowledge graphs and common data models like OMOP from different types of health data while preserving the nuances captured at the source. The added bonus is that schemas and overlays are reusable and shareable, so multiple stakeholders can create a collaborative data ecosystem for meaningful data sharing.